Sampling for Approximate Reduct in very Large Datasets
Authors
Abstract
Rough set theory provides a formal framework for data mining. The reduct is the most important concept in applying rough sets to data mining: a reduct is a minimal attribute set that preserves the classification power of the original dataset. Finding a reduct is similar to the feature selection problem. In this paper, we propose two reduct algorithms. The first is based on attribute frequency in the discernibility matrix. The second uses a similar idea together with sampling techniques for large datasets. Empirical analysis shows that both algorithms are efficient.
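The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a decision table given as tuples of discrete attribute values with one decision label per row, builds the non-empty discernibility matrix entries, and greedily selects the attribute that appears most often in the entries not yet covered. The sampled variant simply runs the same heuristic on a random subset of rows. All function names (`discernibility_entries`, `frequency_reduct`, `sampled_reduct`) and the tie-breaking rule are hypothetical choices made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def discernibility_entries(rows, labels):
    """Return the non-empty discernibility matrix entries.

    Each entry is the set of attribute indices on which two objects
    with different decision labels take different values.
    """
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] == labels[j]:
            continue  # only pairs from different decision classes matter
        diff = frozenset(
            a for a, (x, y) in enumerate(zip(rows[i], rows[j])) if x != y
        )
        if diff:  # identical rows with different labels cannot be discerned
            entries.append(diff)
    return entries


def frequency_reduct(rows, labels):
    """Greedy heuristic (an assumption, not the paper's exact algorithm):
    repeatedly pick the attribute that occurs most often in the
    entries that are not yet covered by a selected attribute.
    """
    remaining = discernibility_entries(rows, labels)
    selected = []
    while remaining:
        counts = Counter(a for entry in remaining for a in entry)
        # Break ties on the lowest attribute index so the result is deterministic.
        best = min(counts, key=lambda a: (-counts[a], a))
        selected.append(best)
        remaining = [e for e in remaining if best not in e]
    return sorted(selected)


def sampled_reduct(rows, labels, sample_size, seed=0):
    """Run the same heuristic on a random sample of the rows."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(rows)), min(sample_size, len(rows)))
    return frequency_reduct([rows[i] for i in idx], [labels[i] for i in idx])
```

The greedy selection returns an attribute set that discerns every pair of objects the full table can discern, but it is not guaranteed to be minimal, and a set computed on a sample is only approximate for the full dataset. Building all pairwise entries costs time quadratic in the number of rows, which is the reason sampling is attractive for very large tables.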
Similar resources
Fast Uncertainty Sampling for Labeling Large E-mail Corpora
One of the biggest challenges in building effective anti-spam solutions is designing systems to defend against the ever-evolving bag of tricks spammers use to defeat them. Because of this, spam filters that work well today may not work well tomorrow. The adversarial nature of the spam problem makes large, up-to-date, and diverse e-mail corpora critical for the development and evaluation of new ...
Hunches and Sketches: rapid interactive exploration of large datasets through approximate visualisations
Information visualisation presents powerful techniques for data analytics. However, rendering visualisations of big datasets is impractical on commodity hardware. There is increasing interest in approaches where data sampling and probabilistic algorithms are used to support faster processing of large datasets. This approach to approximate computation has not yet paid close attention to the way ...
An interactive framework for spatial joins: a statistical approach to data analysis in GIS
Many Geographic Information Systems (GIS) handle a large volume of geospatial data. Spatial joins over two or more geospatial datasets are very common operations in GIS for data analysis and decision support. However, evaluating spatial joins can be very time intensive due to the size of datasets. In this paper, we propose an interactive framework that provides faster approximate answers of spa...
Feature ranking in rough sets
We propose a novel feature ranking technique using discernibility matrix. Discernibility matrix is used in rough set theory for reduct computation. By making use of attribute frequency information in discernibility matrix, we develop a fast feature ranking mechanism. Based on the mechanism, two heuristic reduct computation algorithms are proposed. One is for optimal reduct and the other for app...
Ensembles of Classifiers Based on Approximate Reducts
The problem of improving rough set based expert systems by modifying a notion of reduct is discussed. The notion of approximate reduct is introduced, as well as some proposals of quality measure for such a reduct. The complete classifying system based on approximate reducts is presented and discussed. It is proved that the problem of finding optimal set of classifying agents based on approximat...
Journal title:
Volume, issue:
Pages: -
Publication date: 2000